Abstract—Many stand-alone, FPGA-based accelerators separate the implementation of a computation into two components: (1) a large parallel component that is realized as hardware on the spatial FPGA fabric and (2) a small control and coordination component that is realized as software on an embedded soft-core processor such as an off-the-shelf Xilinx MicroBlaze (or on an off-chip host CPU). While this hardware-software partitioning methodology lowers the design effort of composing the accelerator system, it introduces unnecessary Amdahl’s Law bottlenecks and limits scalability. In this paper, we show how to avoid these limitations with VLIW-SCORE: a combination of a high-level parallel programming framework called SCORE and a custom, h...
FPGA overlays have shown the potential to improve designers’ productivity through balancing flexibil...
In this article, we present experiences implementing a general Parallel Discrete Event Simulation (P...
Automated code generation and performance tuning techniques for concurrent architectures such as GP...
Spatial processing of sparse, irregular, double-precision floating-point computation using a single ...
Abstract—Single-FPGA spatial implementations can provide an order of magnitude speedup over sequenti...
Spatial processing of sparse, irregular floating-point computation using a single FPGA enables up...
Platform Multicore Processor, Complex Programmable Logic Devices (CPLDs) Application-Specific Integr...
A common approach to decreasing embedded application execution time is creating a homogeneous parall...
This paper describes VPF, a VLIW SIMD processor architecture developed to demonstrate the possibilit...
Embedded systems present a tremendous opportunity to customize designs by expl...
Application-driven processor designs are becoming increasingly feasible. Today, advances in field-pr...
We discuss VThreads, a novel VLIW CMP with hardware-assisted shared-memory Thread support. VThreads ...
After more than 30 years, reconfigurable computing has grown from a concept to a mature field of scien...
Due to ever increasing complexity of circuits, EDA tools and algorithms are demanding more computati...